ABSTRACT

This research project examined the relationship
between measures of speaker effectiveness obtained from rating scales
and those obtained from objective comprehension tests of speech
content. Two studies were used in order to provide independently
derived results which could be compared. In the first study, 49
undergraduate public-speaking students judged 6 speeches using both a
modified Baird-Knower rating scale and an objective comprehension
test. Approximately half of the subjects listened to audio tapes of
the speeches and half to video tapes, with four of the six speeches
used for final analysis. In the second study, 1190 students in 54
basic speech classes each judged one speech using five rating scales
and a three-item comprehension test. Results from these studies
indicated that (1) relationships among ratings on individual scales
were high, (2) comprehension measures correlated to a modest degree
(first investigation only), and (3) negligible relationships existed
between ratings and comprehension scores. These findings suggest that
rating scales and comprehension scores are not measuring the same
degrees and forms of speaker effectiveness. (JM)
TWO INVESTIGATIONS OF THE RELATIONSHIP AMONG
SELECTED RATINGS OF SPEECH EFFECTIVENESS AND
COMPREHENSION



LARRY L. BARKER, ROBERT J. KIBLER and RUDOLPH W. GETER






RATINGS of speakers' effectiveness have traditionally been used by
researchers and classroom instructors to assess speaking ability.¹
These ratings are based on theoretically, empirically, and/or
observationally derived criteria which, it is assumed, reflect a valid
measure of speaking skill. The use of such scales is based on the
assumptions (1) that there is some absolute standard or model of
excellence with which a given speech may be objectively compared;
(2) that the comparison between an objective standard and the speech
under observation may be made in numerical terms on an interval scale
ranging from effective to ineffective; (3) that actual ratings are
primarily a function of the stimulus (speech) rather than the internal
subjective state of a competently trained judge of speaking.

Dr. Barker is Assistant Professor of Speech and Assistant Director of
the Communication Research Center, Department of Speech, Purdue
University. Dr. Kibler is Associate Professor of Speech and Associate
Director, Communication Research Center, Department of Speech, Purdue
University. Mr. Geter is Instructor in Speech at the Purdue University
Regional Campus, Fort Wayne, Indiana.

The first investigation here reported was supported cooperatively by
the Educational Research Bureau, the Office of Research and Projects,
and the School of Communications at Southern Illinois University. The
second investigation comprises a portion of Mr. Geter's A.M. thesis
[institution and year illegible]. The investigators are indebted to
Eugenia Hunter, David Petersen, and William Smith, all of Southern
Illinois University, for assistance in this research.

¹ For examples of research and reviews of problems related to
assessing speaking ability with ratings see Samuel L. Becker, "The
Rating of Speeches: Scale Independence," SM, XXIX (March 1962), 38-44;
Samuel L. Becker and Carl A. Dallinger, "The Effect of Instructional
Methods upon Achievement and Attitudes in Communication Skills," SM,
XXVII (March 1960); Robert N. Bostrom, "Dogmatism, Rigidity, and
Rating Behavior," Speech Teacher, XIII (November 1964), 283-287; Keith
Brooks, "Some Basic Considerations in Rating Scale Development: A
Descriptive Bibliography," Central States Speech Journal, IX (Fall
1957), 27-31; Theodore Clevenger, Jr., "Influence of Scale Complexity
on the Reliability of Ratings of General Effectiveness in Public
Speaking," SM, XXXI (June 1964), 153-156; Gerald R. Miller, "Agreement
and the Grounds for It: Persistent Problems in Speech Rating," Speech
Teacher, XIII (November 1964), 257-261.

Some researchers have proposed use of behavioral measures derived from
audience reactions to assess speech effectiveness.² Examples of such
measures include comprehension as determined by an objective test;
attitude change as determined by a shift-of-opinion ballot; observable
actions such as voting, buying, donating blood or charitable
contributions; physiological measures of changes as in heart rate,
blood pressure, pupillary dilation, or palmar sweat. Investigators
proposing such measures of a speaker's effectiveness have contended
that effective speeches do not necessarily adhere to set theoretical
standards yet change the behavior(s) of audiences in manners desired
by the speakers.

A first step toward clarifying these matters is to examine both
ratings and behavioral measures to determine whether both are
measuring the same

² For example, see Paul D. Holtzman, Robert L. Dunham, and Richard L.
Spann, "Direct Assessment of Effectiveness of Student Speakers," The
Journal of Communication, XVI (June 1966); Charles R. Gruner and
Marsha W. Gruner, "Do Grades Awarded Classroom Speeches Indicate
Effectiveness of [word illegible] upon Audiences?" paper presented at
the Speech Association of America Convention, Chicago, December [day
illegible], 1966.
degrees and forms of speakers' effectiveness. The present
investigations focus on this problem by providing comparative data
regarding the relationship between selected rating scales and a
measure of one behavior: comprehension. Two studies are reported here.
Different rating scales, comprehension measures, types of speeches,
and subjects were used in the two investigations in order to provide
independently derived results regarding the problem being examined.

INVESTIGATION I.

Procedure

Subjects

Subjects for the first investigation were randomly selected from
available public speaking classes at Southern Illinois University
(N = 49). Participating subjects were inexperienced raters in that
they had received only general classroom training in evaluation and
they had limited experience in evaluating speeches in the classroom.

Criterion Variables

The variables under consideration were (1) a comprehension test and
(2) a modified Baird-Knower rating scale.³ The comprehension test
contained twenty-five multiple-choice and fill-in items over five of
six speeches presented to subjects in a series. Content validity was
determined in the following manner. Manuscripts of the six speeches
accompanied by sixty questions (ten items per speech) were distributed
to thirty graduate students. The graduate students read each speech
and then attempted to answer the questions about the speech. When
answers were not apparent from the first reading, they were

³ See the original Baird-Knower rating scale, published in A. Craig
Baird and Franklin H. Knower, General Speech, 3rd ed. (New York,
1963), p. [illegible].
allowed to read through the manuscript again to find them. The tests
completed by the graduate students were scored, and test items for
which answers were not identified by at least ninety percent of the
graduate students were discarded. The remaining items were those
determined to be answerable by reading the speeches. It was inferred
that the same information could be obtained through listening
carefully to the speeches. A split-half reliability estimate corrected
by the Spearman-Brown prophecy formula was found to be .36 (N = 56
basic speech course students), and the test was, consequently, judged
sufficiently reliable for the purposes of the investigation.
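
As a rough illustration of the reliability procedure named above, the following sketch computes a split-half estimate and steps it up with the Spearman-Brown prophecy formula. The odd/even division of items, the function names, and the score matrix are assumptions made for the example; the article does not state how the halves were formed, and none of the data are the authors'.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of 0/1 item scores per examinee.

    Assumption: odd- and even-numbered items form the two halves.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical scores for five examinees on a six-item test.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 3))
```

The correction is needed because each half is only half as long as the full test, so the raw correlation between halves understates the reliability of the whole instrument.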

The Baird-Knower scale is an instrument frequently used in classroom
speech evaluation. Several modifications were made in the original
Baird-Knower scale in the present investigation. (1) "Voice" and
"Articulation," which appear as separate criteria on the original
scale, were combined into one criterion requiring a single rating.
(2) An "Audience Interest and Adaptation" scale was added as a
criterion to be rated on the modified scale. (3) "Physical Activity"
was eliminated as a scale because some subjects heard the speeches via
audio tape. (4) Descriptive words and comments which are listed under
each criterion on the original scale were changed from negative
statements to positive statements on the modified scale. (5) The 1-9
rating scales used on the original Baird-Knower form were changed to
1-5 scales for each variable. Thus, the following scales were included
on the modified evaluation form: speech attitudes and adjustments,
voice and articulation, language, audience interests and adaptation,
ideas, organization, and general effectiveness. In addition, a total
for these ratings was computed.
Stimulus Speeches

A series of six three- to five-minute informative speeches, video
taped and audio taped, was shown to subjects. The speeches had been
assessed in a previous investigation and judged to represent a wide
range of speaking effectiveness. Nine faculty evaluators had judged
two speeches to be above average, two average, and two below average.
Test-retest reliability estimates on the Baird-Knower scales for a
series of nine speeches (six used here plus three others), for nine
faculty judges, ranged from .6[illegible] to .86. For eight of the
nine scales reliability estimates were above .70, and five of the nine
were above .75.⁴

The six speeches were recorded on audio and video tape in two
different, randomly assigned orders with two-minute pauses between
speeches. The pause allowed time for subjects to rate the speech
before the next speech began, thus reducing the possibility of an
adverse "overlap" effect.⁵ The orders of presentation and the two
modes were used to control for possible order and/or mode effects.
Complete data were obtained for analysis for four of the six speeches
presented. These were the speeches common to the two orders of
presentation. Each speech omitted from the analysis appeared as the
first speech in one of the two orders.

Administration of Speeches and
Evaluative Instruments

Two weeks prior to the beginning of the investigation, subjects were
given sample copies of the modified Baird-Knower rating scales and
were instructed in their use by individual course instructors. In most
cases subjects were allowed to practice using the scales by rating
their classmates during regular class speeches.

⁴ Robert J. Kibler, Larry L. Barker, and Roy H. Enoch, "The
Development and Preliminary Assessment of a Set of Video-Taped
Informative Speech Models," Central States Speech Journal, XVIII
(November 1967), 268-[illegible].

⁵ Larry L. Barker, Robert J. Kibler, and Eugenia C. Hunter, "An
Empirical Study of Overlap Rating Effects," Speech Teacher, XVII
(March 1968), 160-166.

The six speeches were presented to the subjects during a two-day
period. On the day the speeches were presented, individual class
instructors introduced a Research Associate, telling the subjects that
the Associate was a member of the speech department attempting to
assess the ability of students to evaluate speeches. Evaluation forms
and instructions were distributed by the Research Associate and the
instructions for using the rating scales were read aloud. One series
of six speeches was then presented by video or audio tape,
approximately half of the subjects receiving the stimulus speeches by
each mode of presentation. Immediately after being exposed to each
speech in the series, subjects evaluated it on the modified
Baird-Knower rating form. At the conclusion of the entire series of
speeches, evaluation forms were collected by the Research Associate.
The comprehension test (immediate post test) was next distributed, and
subjects were instructed to complete it. The test did not include
questions on the first speech in the series. The orders of questions
otherwise corresponded to the orders of presentation, and questions
pertaining to a specific speech were identified by a heading which
provided a cue to the content of the speech (e.g., "Air Defense
Command"). Test booklets were collected at the end of the session and
subjects were told they would receive their test scores at a later
date.

Three weeks after the initial administration of treatments but before
subjects learned of their scores on the immediate post test, the same
comprehension test (delayed post test) was administered to all
subjects. Students were told by their individual instructors that the
test was to determine how much information had been retained either as
a result of initially viewing the speeches or taking the initial
comprehension test.

Statistical Analysis

Pearson product-moment correlation coefficients for use with paired,
ungrouped data were computed among rating scales on the modified
Baird-Knower form and the comprehension tests.⁶ The result of the
analysis was a ten by ten matrix of inter-variable correlations.
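
The analysis described here can be sketched as follows, assuming each variable is held as a list of scores in the same subject order. The variable names and all scores are hypothetical, and only three variables are shown where the investigation used ten.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for paired, ungrouped data.

    Raises ZeroDivisionError if either variable has no variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(variables):
    """Return a nested dict of r for every pair of named variables."""
    names = list(variables)
    return {
        a: {b: pearson_r(variables[a], variables[b]) for b in names}
        for a in names
    }


# Hypothetical scores for six subjects (not data from the study).
data = {
    "general_effectiveness": [4, 3, 5, 2, 4, 3],
    "total_rating": [25, 20, 30, 14, 26, 19],
    "immediate_comprehension": [12, 15, 11, 14, 13, 16],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in data])
```

The matrix is symmetric with ones on the diagonal, so only the entries above or below the diagonal carry information.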

The two different orders of presenting the series of six speeches
resulted in four of the last five speeches in each series being the
same, though they were heard in different orders. Only data for the
four common speeches were included in the analysis of speech
comprehension and ratings.

Results 

The results of the investigation are reported in Table 1 and indicate
that (1) there was a relatively high correlation among most scales of
the modified Baird-Knower rating form (all r's ≥ .6[illegible]); and
(2) the correlations among the scales on the Baird-Knower form and
either immediate or delayed comprehension test scores were so low (all
but three r's ≤ .17) as to suggest negligible relationships exist
among these variables.

⁶ J. P. Guilford, Fundamental Statistics in Psychology and Education,
4th ed. (New York, 1965), pp. 91-111.

The study indicates, as has previous research, that most individual
scales on the Baird-Knower rating form correlate highly with "General
Effectiveness" and "Total Rating." The scale which correlated least
with other scales was "Ideas," but the coefficients obtained were
still relatively high.

Immediate and delayed comprehen- 
sion tests correlated with each other to 
a modest degree. Information regarding 
the normal forgetting curve suggests an 
extremely high correlation should not 
be expected between these scores. The 
correlation obtained here supports this observation (r = .60).

INVESTIGATION II.

Procedure

Subjects

Subjects were students (N = 1190) enrolled in Purdue University's
basic speech course and instructors (teaching assistants) for 72
sections of the course. Subjects in experimental groups (N = 898) were
students assigned by the registrar to 54 sections of the course; the
control group consisted of students (N = 292) assigned to 18 different
sections. The instructors for the 54 sections served as experimental
subjects; instructors for the 18 different sections served as control
subjects. In addition, 54 students (from other than the experimental
or control sections) served as speakers to be evaluated by
experimental groups.

Criterion Variables

Rating scales developed by Price⁷ were modified for use in this
investigation. The modifications included deleting one scale (Is the
speaker intelligible?) and adding a "general effectiveness" scale.
This was done on the basis of Clevenger's research.⁸ Reliabilities for
the Price scales in conjunction with general effectiveness have been
reported by Clevenger (reliability coefficients ranged from .61 to .63
with a maximum of seven judges).⁹ The following scales were included
on the rating form as it was used in this investigation. (1) Does the
speaker sound reasonable? (2) Does the speaker communicate well
through bodily action? (3) Is the speaker socially acceptable?
(4) Does the speaker use language vividly and imaginatively? (5) Does
the speaker have a pleasing and expressive voice? (6) General, overall
effectiveness. An average of these six scales was also computed as a
criterion measure.

A three-item, multiple-choice comprehension test was developed for
each of the 54 persuasive speeches. Items for a comprehension test on
each speech were drawn from those submitted by the student speakers
but were modified to meet

⁷ William K. Price, "The University of Wisconsin Speech Attainment
Test," unpubl. diss. (University of Wisconsin, 1964).

⁸ Theodore Clevenger, Jr., "Influence of Scale Complexity on the
Reliability of Ratings of General Effectiveness in Public Speaking,"
[page illegible].

⁹ Ibid.



three criteria: (1) questions were to pertain to material at the
beginning, middle, and end of the speech; (2) questions were to be
phrased in multiple-choice form with five apparently reasonable
choices (one correct answer and four foils); (3) the correct answers
to the questions were to have been stated obviously in the speech and
the language of the speech exactly duplicated in each correct answer.

Stimulus Speeches

The student speakers were assigned to present their speeches to one of
54 sections (experimental groups). The speakers had received minimal
assistance from their course instructors in preparing persuasive
speeches to be delivered from manuscript as the sixth assignment in
the course. Students received extra credit in their own classes for
presenting the speeches to the experimental groups and were informed
that they were participating in a department-wide evaluation program.

For purposes of the investigation, comprehension score was defined as
the sum of right answers for the three test items administered in
experimental groups after the speeches had been heard. A comparison of
the comprehension results for experimental groups (subjects who
received the speeches and took the comprehension test) and control
groups (subjects who did not receive the speeches but took the
comprehension test) indicated that the experimental groups
comprehended significantly more information than subjects in the
control groups. When instructors' scores from experimental and control
groups were compared by a t test, a significant t (≤ .03 level of
confidence; df = 71) was obtained, indicating the instructors in the
experimental groups comprehended significantly more information from
the speeches than those in the control groups. Similar findings were
obtained when students in experimental and control groups were
compared. A significant t (≤ .03 level of confidence) of 11.312
(df = 1189) indicated that subjects in experimental groups
comprehended significantly more information from the speeches than
those in the control groups. These t test results show that subjects
receiving the speeches obtained significantly more information than
those who did not receive the messages. The results further indicate
that the speeches contained information not generally available and
that they were, in fact, informative.
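
The scoring rule and group comparison described above might be sketched as follows. The article does not say which form of the t test was used, so the pooled-variance (equal variances assumed) form here is an assumption, as are the answer key, the response data, and the function names.

```python
from math import sqrt


def comprehension_score(answers, key):
    """Number of test items answered correctly (0 to len(key))."""
    return sum(1 for given, correct in zip(answers, key) if given == correct)


def pooled_t(group_a, group_b):
    """Student's t for two independent groups, equal variances assumed.

    Returns the t statistic and its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical key and responses for one three-item test.
key = ["b", "d", "a"]
experimental_answers = [
    ["b", "d", "a"], ["b", "d", "c"], ["b", "d", "a"], ["b", "e", "a"],
]
control_answers = [
    ["b", "c", "e"], ["a", "c", "e"], ["c", "d", "b"], ["b", "d", "e"],
]

experimental = [comprehension_score(a, key) for a in experimental_answers]
control = [comprehension_score(a, key) for a in control_answers]
t, df = pooled_t(experimental, control)
print(f"t = {t:.3f}, df = {df}")
```

A control group that takes the test without hearing the speech gives a baseline for how many items can be answered from prior knowledge or guessing alone.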

Administration of Speeches and
Evaluative Instruments

Student speakers were instructed to report five to ten minutes early
to the classroom where they were scheduled to speak. The instructor of
the section read instructions to each class at the beginning of the
period. This information described the nature of the project on speech
evaluation and indicated that the class had been selected as an
"evaluation section" in the project. The students in the class were
told that they would evaluate the speaker following his speech, and
that their evaluations would not affect the speaker's grade in any way
but might affect the evaluation techniques applied to students who
would take the basic speech course in the future. Brief instructions
concerning the concepts of specific scales on the rating form were
also presented.

Each speaker was introduced by name, presented his speech, and left
the room immediately afterward. The instructor then re-emphasized that
the evaluations being sought would be helpful in revising the
techniques used in the course but would not affect the grade of the
speaker. Following this reminder, the instructor distributed two-page
booklets consisting of the general effectiveness scale, five specific
rating scales (from which a mean score was procured), and the
comprehension-test items. The evaluating students and instructor
completed booklets which were then returned to one of the
investigators after the class period ended.

Statistical Analysis

Pearson product-moment correlation coefficients for use with paired,
ungrouped data were computed among the general-effectiveness rating
scale, the average (arithmetic mean) of the five rating scales on the
modified Price rating form, and the comprehension test. The result of
the analysis was a three-by-three inter-variable correlation matrix.

Results

The results of the investigation indicate (1) that the correlations
between the average of the five scales and the general effectiveness
scale were rather high among both instructors (r = .89) and students
(r = .96); (2) that the correlations between the average of the five
scales on the modified Price form and comprehension-test scores were
low (r = -.06 for instructors and r = .10 for students); and (3) that
the correlations between general-effectiveness ratings and
comprehension-test scores were also low (r = .005 for instructors and
r = .10 for students). This study shows, as did the first
investigation, that the rating measures were highly intercorrelated
but that rating measures did not correlate meaningfully with the
measure of comprehension.
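
One conventional way to gauge how meaningful a correlation is, offered here only as an illustration and not as part of the authors' analysis, is to square r and read the result as the proportion of variance the two measures share. The coefficients below are illustrative values of the general size discussed above.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their Pearson r."""
    return r ** 2


# Illustrative coefficients only: one high rating-to-rating value and one
# low rating-to-comprehension value.
examples = {
    "rating scale average vs. general effectiveness": 0.96,
    "rating scale average vs. comprehension": 0.10,
}

for label, r in examples.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

On this reading a coefficient of .96 corresponds to roughly 92 percent shared variance, while a coefficient of .10 corresponds to 1 percent.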

Discussion 

The results obtained from these two investigations indicate (1) that
relationships among ratings on the individual scales were reasonably
high, (2) that the two comprehension measures correlated to a modest
degree (first investigation only), but (3) that negligible
relationships existed between ratings on the various scales and the
comprehension measures. These findings are interpreted as indicating
that the two types of criterion measures, rating scales and
comprehension scores, are probably not measuring the same degrees and
forms of speakers' effectiveness.

If this observation is substantiated in subsequent research, it will
be necessary for researchers to clarify the nature of the particular
form of "speaker effectiveness" which is appropriate for any given
research problem. Furthermore, those using rating scales for such
purposes as assessing the effectiveness of classroom speech behavior
and contest speaking may wish to weigh such practical concerns as
convenience and ease in using rating scales against the more
fundamental question of what is really being measured by such ratings
of a speaker's effectiveness.

Additional research is required using different speech samples,
different audiences, and different types of behavioral measures to
ascertain whether the findings reported here are generalizable across
other types of communicative events. Among the behavioral measures
which might be correlated profitably with ratings in future research
are: voting, nonverbal behavior, attitude change, and various types of
recall. It would also be well to explore the consequences of altering
evaluators' understandings of why they are furnishing evaluations and
the influences of "set to rate" upon comprehension of messages.



